Transaction / Regular Paper Title
Authors
Abstract
Executing a large number of independent jobs, or jobs comprising a large number of tasks that perform minimal intertask communication, is a common requirement in many domains. Various technologies, ranging from classic job schedulers to the latest cloud technologies such as MapReduce, can be used to execute these “many-tasks” in parallel. In this paper, we present our experience in applying two cloud technologies, Apache Hadoop and Microsoft DryadLINQ, to two bioinformatics applications with the above characteristics. The applications are a pairwise Alu sequence alignment application and an EST (Expressed Sequence Tag) sequence assembly program. First, we compare the performance of these cloud technologies using the above applications, and for one application we also compare them with a traditional MPI implementation. Next, we analyze the effect of inhomogeneous data on the scheduling mechanisms of the cloud technologies. Finally, we compare the performance of the cloud technologies on virtual and non-virtual hardware platforms.
Similar resources
Transaction / Regular Paper Title
This paper investigates the use of distributed processing for the problem of emotion recognition from physiological sensors, using a popular machine learning library in distributed mode. Specifically, we run a random forests classifier on the biosignal data, which have been pre-processed to form exclusive groups in an unsupervised fashion, on a Cloudera cluster using Mahout. The use of distribute...
Transaction / Regular Paper Title
Constructing models of dynamic systems is an important skill in both mathematics and science instruction. However, it has proved difficult to teach. Dragoon is an intelligent tutoring system intended to quickly and effectively teach this important skill. This paper describes Dragoon and an evaluation of it. The evaluation randomly assigned students in a university class to either Dragoon or bas...
Transaction / Regular Paper Title
Approximate Nearest Neighbor (ANN) search has become a popular approach for performing fast and efficient retrieval on very large-scale datasets in recent years, as the size and dimension of data grow continuously. In this paper, we propose a novel vector quantization method for ANN search which enables faster and more accurate retrieval on publicly available datasets. We define vector quantiza...
Transaction / Regular Paper Title
Valid-time indeterminacy is “don’t know when” indeterminacy, coping with cases in which one does not exactly know when a fact holds in the modeled reality. In this paper, we first propose a reference representation (data model and algebra) in which all possible temporal scenarios induced by valid-time indeterminacy can be extensionally modeled. We then specify a family of sixteen more compact r...
Transaction / Regular Paper Title
With the shifting focus of organizations and governments towards digitization of academic and technical documents, there has been an increasing need to use this reserve of scholarly documents for developing applications that can facilitate and aid in better management of research. In addition to this, the evolving nature of research problems has made them essentially interdisciplinary. As a res...
A First Step Towards Implementing Dynamic Algebraic Dependencies
We present a class of dynamic constraints (DADS) which are of practical interest and allow one to express restrictions such as: if some property holds now, then in the past some other property should have been true. The paper investigates in a constructive manner the definition of transaction-based specifications equivalent to DAD-constraint-based specifications. Our study shows the limitation ...